Hedged Maximum Likelihood Estimation 
This paper proposes and analyzes a new method for quantum state estimation, called hedged maximum likelihood (HMLE). HMLE is a quantum version of Lidstone's Law, also known as the "add $\beta$" rule. A straightforward modification of maximum likelihood estimation (MLE), it can be used as a plug-in replacement for MLE. The HMLE estimate is a strictly positive density matrix, slightly less likely than the ML estimate, but with much better behavior for predictive tasks. Single-qubit numerics indicate that HMLE beats MLE, according to several metrics, for nearly all "true" states. For nearly-pure states, MLE does slightly better, but neither method is optimal.
Quantum state estimation is a basic task in quantum information science [1], simple to describe but hard to do right. The estimator gets $N$ independently and identically prepared (i.i.d.) quantum systems, performs measurements on them, analyzes the data, and reports a single-system density matrix $\hat\rho$. The goal is to report the most "accurate" answer possible. There is room for substantial debate over what this means, which the present paper will avoid by adopting three common assumptions: (i) we are concerned with $N$ copies of an unknown "true" state $\rho$; (ii) the goal is to get $\hat\rho$ as close as possible to $\rho$, according to some metric $d(\rho,\hat\rho)$; and (iii) measurement outcomes are intrinsically random, and we are concerned with average (over measurement outcomes, not over $\rho$) performance. I will not consider how to choose a measurement, seeking instead a protocol that works well for all measurements.

Maximum likelihood estimation (MLE) [2-5], the most common protocol, tends to report rank-deficient estimates with zero eigenvalues [5]. Those eigenvalues represent probabilities. Assigning a zero probability indicates extraordinary confidence - confidence that the data do not support. For predictive purposes, this "zero eigenvalue problem" can be disastrous in practice.

This paper suggests an alternative, hedged maximum likelihood. HMLE is a simple modification of MLE that can be used as a plug-in substitute for it. The modification consists, in its entirety, of the following rule. Replace the standard likelihood function $\mathcal{L}(\rho) = \Pr(\text{observed data}\,|\,\rho)$ with the product of $\mathcal{L}(\rho)$ and a "hedging function"

$$h(\rho) = \det(\rho)^{\beta}, \qquad (1)$$

where $\det(\cdot)$ is the determinant, and $\beta \approx \tfrac{1}{2}$ is a positive constant chosen at the estimator's discretion. The rest of this Letter is devoted to explaining, deriving, and analyzing this procedure.

Background: HMLE is motivated by a rule for estimating classical probabilities called Lidstone's Law [3 [S] - or, more colloquially, "add $\beta$". Suppose we have observed $N$ samples from an unknown i.i.d. distribution $p = \{p_1 \ldots p_K\}$, and have seen $n_k$ "$k$"s. What probabilities $\hat p$ should we assign for the next sample? The likelihood, $\mathcal{L}(p) = \prod_k p_k^{n_k}$, is maximized by the natural and obvious estimate

$$\hat p_k = \frac{n_k}{N}. \qquad (2)$$
This can be disastrous in practice. Suppose some letter $k$ has not yet been observed, so $n_k = 0$, and MLE assigns $\hat p_k = 0$. This is fine if $p_k$ really is zero, but it's equally plausible that $p_k$ is positive but small. If it is, the consequences of this error depend on what the estimate is used for. They are catastrophic when the estimate is used for predictive tasks, such as data compression or gambling.

Compression and gambling define operational interpretations of $p$, and identify relative entropy as a measure of error:

$$D(p\|\hat p) = \sum_k p_k \left(\log p_k - \log \hat p_k\right). \qquad (3)$$



A gambler maximizes his bankroll's expected growth rate by gambling a fraction $\hat p_k$ of it on outcome "$k$", and a compressor gets optimal performance by replacing "$k$" with a codeword of length $-\log \hat p_k$. If the true probabilities are $p$ and the estimate is $\hat p$, then the gambler's wealth grows as $W(n) = W(0)\,e^{n(\log K - H(p) - D(p\|\hat p))}$, where $H(p) = -\sum_k p_k \log p_k$ is the entropy of $p$. Similarly, the length of the compressor's compressed string grows as $L = n\,[H(p) + D(p\|\hat p)]$. In both cases, $H(p)$ is the unavoidable cost of $p$'s randomness, while $D(p\|\hat p)$ is the additional cost of estimating it incorrectly. Setting $\hat p_k = 0$ thus implies extreme strategies for gambling (bet the entire bankroll against "$k$") and data compression (map "$k$" to an infinitely long codeword). Either way, if the next letter is "$k$", the consequences are disastrous.

"Add $\beta$" avoids these catastrophes by hedging against as-yet-unseen possibilities. It assigns probabilities

$$\hat p_k = \frac{n_k + \beta}{N + K\beta}. \qquad (4)$$

The lowest probability that can be assigned is $\beta/(N + K\beta)$. Like Eq. 2, this rule has a statistical derivation. It is the Bayes estimator (i.e., it minimizes expected cost) for a relative entropy cost function and a Dirichlet-$\beta$ prior

$$P_0(p)\,dp \propto \prod_k p_k^{\beta - 1}\,dp. \qquad (5)$$

Common examples of Dirichlet priors include the "flat" Lebesgue measure ($\beta = 1$), and Jeffreys' prior ($\beta = \tfrac{1}{2}$). Given any prior, we can minimize expected relative entropy by: (1) updating the prior to a posterior via Bayes' Rule; and (2) reporting its mean value. For the Dirichlet-$\beta$ prior, this gives the "add $\beta$" rule.
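The cost of a zero estimate can be made concrete with a small numerical sketch. The distribution, the counts, and the helper name `relative_entropy` below are illustrative assumptions, not taken from the paper; the code only evaluates Eq. 3 for the MLE rule (Eq. 2) and the "add $\beta$" rule (Eq. 4).

```python
import math

def relative_entropy(p, p_hat):
    """D(p || p_hat) in nats, as in Eq. 3 (hypothetical helper name)."""
    total = 0.0
    for pk, qk in zip(p, p_hat):
        if pk == 0.0:
            continue                # convention: 0 * log(0/q) = 0
        if qk == 0.0:
            return math.inf         # estimate rules out an event that can occur
        total += pk * (math.log(pk) - math.log(qk))
    return total

# Assumed example: K = 3 letters, N = 10 samples, third letter never seen.
p_true = [0.70, 0.29, 0.01]
counts = [7, 3, 0]
N, K, beta = sum(counts), len(counts), 0.5

mle = [n / N for n in counts]                            # Eq. 2
hedged = [(n + beta) / (N + K * beta) for n in counts]   # Eq. 4

print(relative_entropy(p_true, mle))      # infinite: MLE assigned probability 0
print(relative_entropy(p_true, hedged))   # finite
```

The MLE estimate pays an infinite penalty because the true distribution gives the unseen letter nonzero probability, while the hedged estimate does not.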

The "add $\beta$" rule is not intrinsically Bayesian, however. A naive estimator following Eq. 2 can simulate it by adding $\beta$ dummy observations of each letter $k$. This yields new frequencies $\{n_k + \beta\}$ and a total of $N + K\beta$ observations. To generalize to non-integer $\beta$, we observe that the likelihood function is $\mathcal{L}(p) = \Pr(\{n_k\}|p) = \prod_k p_k^{n_k}$, and adding $\beta$ dummy observations of each letter yields a hedged likelihood function

$$\mathcal{L}'(p) = \prod_k p_k^{n_k + \beta} = \left(\prod_k p_k^{\beta}\right) \mathcal{L}(p), \qquad (6)$$

whose maximum value is achieved by Eq. 4. When $\beta$ is not an integer, the hedged likelihood (Eq. 6) remains well-defined, and the "add $\beta$" rule still maximizes it.
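As a quick check of this statement, the following sketch evaluates the hedged log-likelihood of Eq. 6 at the "add $\beta$" estimate and at nearby points on the probability simplex. The function names and the example counts are assumptions made for illustration only.

```python
import math

def add_beta(counts, beta):
    """The "add beta" estimate of Eq. 4."""
    N, K = sum(counts), len(counts)
    return [(n + beta) / (N + K * beta) for n in counts]

def hedged_log_likelihood(p, counts, beta):
    """log of Eq. 6: sum_k (n_k + beta) * log p_k."""
    return sum((n + beta) * math.log(pk) for n, pk in zip(counts, p))

counts, beta = [3, 0, 7], 0.5          # assumed data; beta need not be an integer
estimate = add_beta(counts, beta)
best = hedged_log_likelihood(estimate, counts, beta)

# Shift a little probability from letter i to letter j; the value should drop.
eps = 1e-3
for i in range(len(counts)):
    for j in range(len(counts)):
        if i != j:
            nearby = list(estimate)
            nearby[i] -= eps
            nearby[j] += eps
            assert hedged_log_likelihood(nearby, counts, beta) < best
```

Because the hedged log-likelihood is concave on the simplex, every such perturbation lowers it, which is consistent with Eq. 4 being the maximum.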

Quantum Hedging: The quantum analogue of a distribution $p$ is a $d \times d$ density matrix $\rho$. It cannot be observed directly; observing a sample of $\rho$ requires choosing a particular measurement $\mathcal{M}$. Experimentalists often divide the samples into groups and measure $\mathcal{M}_j$ on the $N_j$ samples in group $j$, but $\mathcal{L}(\rho)$ depends only on observed events, not the unobserved alternatives, so we may pretend that all $N$ samples were measured by $\mathcal{M} = \bigcup_j w_j \mathcal{M}_j$, where $w_j = N_j/N$. $\mathcal{M}$ corresponds to a POVM, a set of positive operators $\{E_i\}$ summing to $\mathbb{1}$, which determine the probability of outcome "$i$" as

$$\Pr(i) = \mathrm{Tr}[\rho E_i]. \qquad (7)$$

The frequencies $\{n_i\}$ thus provide information about $\rho$. Interpreting this information is the central problem of quantum state estimation.

The oldest and simplest procedure, linear inversion tomography pj], is based on Eq. 2. Inverting Born's Rule (Eq. 7) yields an estimate $\hat\rho_{\rm tomo}$ satisfying

$$\mathrm{Tr}[\hat\rho_{\rm tomo} E_i] = \frac{n_i}{N} \quad \text{for } i = 1 \ldots m. \qquad (8)$$

If these equations are overcomplete, $\hat\rho_{\rm tomo}$ is chosen by least-squares fitting. Frequently, some of $\hat\rho_{\rm tomo}$'s eigenvalues are negative - a serious problem, for they represent probabilities. This occurs because linear inversion is blind to the shape of the space of quantum states (which assign probabilities to all measurements). It tries to fit data from a single POVM $\mathcal{M}$, and happily assigns negative probabilities for measurements that weren't performed.



The usual fix for this problem is MLE [2]. A likelihood function is derived from the data,

$$\mathcal{L}(\rho) = \Pr(\{n_i\}|\rho) = \prod_i \mathrm{Tr}[\rho E_i]^{n_i}, \qquad (9)$$

and we assign the $\hat\rho$ that maximizes it. Maximization over all trace-1 Hermitian matrices yields $\hat\rho_{\rm tomo}$ (just as in the classical case), but restricting to $\rho \ge 0$ yields a non-negative $\hat\rho_{\rm MLE}$.

However, $\hat\rho_{\rm MLE}$ can still assign zero probabilities - just like its classical counterpart (Eq. 2). If $\hat\rho_{\rm tomo}$ is not strictly positive, $\hat\rho_{\rm MLE}$ will have at least one zero eigenvalue [5], so this is rather common. Moreover, the zero probabilities in $\hat\rho_{\rm MLE}$ are less justified than those in the classical $\hat p_{\rm MLE}$, because they generally correspond to a measurement outcome $|\psi\rangle\langle\psi|$ that is not an element of the measured POVM, and could never have appeared. In contrast, Eq. 2 assigns $\hat p_k = 0$ only when "$k$" has been given $N$ chances to appear and (so far) has not. So although $\hat\rho_{\rm MLE}$ may be the right estimator for some task, its zero eigenvalues represent a level of confidence that is implausible and (for predictive tasks like gambling and compression) catastrophic. Prediction demands a hedged estimator.

Bayesian mean estimators are hedged, and with suitable priors they have extremely good predictive behavior [B]. However, for quantum estimation there are no closed-form solutions, and numerical integration is hard. This is unfortunate, because Bayes estimators for classical probabilities work very well. They yield "add $\beta$" rules when applied to Dirichlet-$\beta$ priors, and Dirichlet priors are well motivated. Jeffreys' prior ($\beta = \tfrac{1}{2}$) yields asymptotically minimax-optimal estimators for data compression [12]. Krichevskiy showed that "add 0.50922..." outperforms all other rules for predicting the next letter [13], and Braess et al. [14] pointed out that $\beta \approx 1$ generally works well because large-$N$ behavior depends only weakly on $\beta$.
This suggests adapting "add $\beta$" to quantum state estimation (independent of Bayesian arguments). However, obvious methods like adding dummy counts don't work. Suppose we estimate a qubit source by measuring $\sigma_x$, $\sigma_y$, and $\sigma_z$ ten times each, and - by unlikely chance - all the outcomes are $+1$. $\hat\rho_{\rm tomo}$ lies well outside the Bloch sphere, and $\hat\rho_{\rm MLE}$ is the projector onto its largest eigenvector. Now, if we add $\beta = 1$ dummy counts, $\hat\rho_{\rm tomo}$ is still outside the Bloch sphere, and $\hat\rho_{\rm MLE}$ is unchanged!
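The arithmetic behind this example can be sketched as follows. The code assumes the standard Bloch parametrization $\rho = (\mathbb{1} + \vec r \cdot \vec\sigma)/2$ and that linear inversion sets each component of $\vec r$ to the observed mean of the corresponding Pauli measurement; the helper names are hypothetical.

```python
import math

def bloch_from_counts(counts):
    """Linear inversion per Pauli axis: r_a = (n_plus - n_minus) / (n_plus + n_minus)."""
    return [(n_plus - n_minus) / (n_plus + n_minus) for n_plus, n_minus in counts]

def norm(r):
    return math.sqrt(sum(x * x for x in r))

def direction(r):
    length = norm(r)
    return [x / length for x in r]

# Ten +1 outcomes on each of sigma_x, sigma_y, sigma_z.
raw = [(10, 0), (10, 0), (10, 0)]
# The same data with beta = 1 dummy count added to every outcome.
padded = [(n_plus + 1, n_minus + 1) for n_plus, n_minus in raw]

r_raw = bloch_from_counts(raw)        # (1, 1, 1), length sqrt(3)
r_pad = bloch_from_counts(padded)     # (5/6, 5/6, 5/6), length about 1.44

print(norm(r_raw), norm(r_pad))       # both exceed 1: outside the Bloch sphere
print(direction(r_raw), direction(r_pad))
```

Both linear-inversion estimates lie outside the unit ball and point in the same direction, so the pure state along that direction, which the text identifies as the MLE estimate, is the same with or without the dummy counts.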

The underlying problem is that MLE tries to fit the observed data, with no consideration of unobserved measurements - but the resulting quantum state makes predictions about those unobserved measurements. Adding dummy data works in the classical case because there are only $K$ different events that can be observed or predicted, so by adding a dummy observation of each one, we rule out the possibility of assigning $\hat p = 0$ to any event. A quantum state assigns probabilities to infinitely
FIG. 1: Methodology: 10^ single-qubit states $\rho_{\rm true}$ were selected at random from the Hilbert-Schmidt ("flat") measure on the Bloch sphere. For each state, 10^ separate datasets were generated, each consisting of $3N$ ($N = 10, 100, 1000$) measurements divided among the three Pauli operators. HMLE estimates (with several $\beta$ values) were calculated. For each $\rho_{\rm true}$, relative entropy error was averaged over all 10^ datasets. Results: Error is strongly correlated with the radial coordinate $r$ of $\rho_{\rm true}$. There are three regimes, separated by $1 - r \approx \sqrt{3/N}$ (dotted line). (1) For mixed states with $1 - r \gg \sqrt{3/N}$, accuracy increases slightly with the amount of hedging (quantified by $\beta$). (2) For slightly mixed states with $1 - r \approx \sqrt{3/N}$, accuracy improves substantially with hedging, but only up to $\beta \approx \tfrac{1}{2}$. (3) For nearly-pure states with $1 - r \ll \sqrt{3/N}$, a small amount of hedging improves accuracy, but higher $\beta$ increases error, and the optimal $\beta$ decreases with $N$.



many different events (measurement outcomes), and a fi- 
nite set of dummy observations cannot bound all of these 
probabilities away from zero. 

Instead, HMLE modifies the likelihood function directly, multiplying it by a unitarily invariant hedging function (Eq. 1) that is independent of what POVM was measured. This modification is directly analogous to the one generated by dummy counts in Eq. 6, because $\det(\rho)$ is the product of $\rho$'s eigenvalues. In both cases, hedging makes very small probabilities less attractive, steering the maximum of $\mathcal{L}'(\cdot)$ away from boundaries. When the data are all drawn from a single classical basis (i.e., $\mathcal{M}$ is a projective measurement), HMLE reproduces the "add $\beta$" rule exactly: if outcome $|k\rangle\langle k|$ was observed $n_k$ times, then the HMLE estimate is

$$\hat\rho_{\rm H} = \sum_k \frac{n_k + \beta}{N + d\beta}\,|k\rangle\langle k|. \qquad (10)$$

Eq. 1 is the only measurement-independent smooth modification of $\mathcal{L}(\rho)$ that yields "add $\beta$" for every basis (see Appendix B of the arxiv.org version for proof).
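For a single qubit the whole procedure fits in a few lines. The sketch below is not the paper's implementation; it assumes the Bloch parametrization $\rho = (\mathbb{1} + \vec r\cdot\vec\sigma)/2$, for which $\det\rho = (1 - |\vec r|^2)/4$ and a Pauli measurement along axis $a$ gives outcomes $\pm 1$ with probability $(1 \pm r_a)/2$. It maximizes the hedged log-likelihood by plain gradient ascent with backtracking, and all function names are hypothetical.

```python
import math

def hedged_loglik(r, counts, beta):
    """log[h(rho) L(rho)] for Bloch vector r; counts[a] = (n_plus, n_minus) per Pauli axis."""
    r2 = sum(x * x for x in r)
    if r2 >= 1.0:
        return -math.inf                           # outside the state space
    total = beta * math.log((1.0 - r2) / 4.0)      # beta * log det(rho)
    for x, (n_plus, n_minus) in zip(r, counts):
        if n_plus:
            total += n_plus * math.log((1.0 + x) / 2.0)
        if n_minus:
            total += n_minus * math.log((1.0 - x) / 2.0)
    return total

def gradient(r, counts, beta):
    r2 = sum(x * x for x in r)
    return [n_plus / (1.0 + x) - n_minus / (1.0 - x) - 2.0 * beta * x / (1.0 - r2)
            for x, (n_plus, n_minus) in zip(r, counts)]

def hmle_qubit(counts, beta, iters=5000, tol=1e-8):
    """Gradient ascent from the maximally mixed state; no positivity constraint is imposed."""
    r = [0.0, 0.0, 0.0]
    f = hedged_loglik(r, counts, beta)
    for _ in range(iters):
        g = gradient(r, counts, beta)
        g2 = sum(x * x for x in g)
        if g2 < tol * tol:
            break
        step = 1.0
        while step > 1e-20:                        # backtracking line search
            trial = [x + step * gx for x, gx in zip(r, g)]
            f_trial = hedged_loglik(trial, counts, beta)
            if f_trial >= f + 1e-4 * step * g2:
                r, f = trial, f_trial
                break
            step *= 0.5
        else:
            break
    return r

# Single-basis data (sigma_z only): 7 "up", 3 "down", beta = 1/2.
r_z = hmle_qubit([(0, 0), (0, 0), (7, 3)], 0.5)
# The ten-times-(+1) data set discussed earlier in the text.
r_all = hmle_qubit([(10, 0), (10, 0), (10, 0)], 0.5)
```

For the single-basis data the result should agree with the "add $\beta$" rule, $\hat p_0 = (7 + \tfrac12)/(10 + 1)$, i.e. $r_z = 2\hat p_0 - 1 = 4/11$, and for the all-$+1$ data the estimate stays strictly inside the Bloch sphere. The divergence of $\log\det\rho$ at the boundary is what keeps the unconstrained search inside the state space.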

Performance: The point of HMLE is to give more accurate estimates than MLE. "Accuracy" depends on the measure of error, but HMLE is motivated by the idea that a state should be predictive, and predictive tasks (e.g., data compression and gambling) suggest that quantum relative entropy is a good measure of inaccuracy. This is a bit unfair to MLE. If $\hat\rho$ is rank-deficient on $\rho$'s support, then $D(\rho\|\hat\rho) = \infty$. Since every true $\rho$ has some nonzero probability of serving up measurement results that yield a rank-deficient $\hat\rho_{\rm MLE}$, the expected value of $D(\rho\|\hat\rho_{\rm MLE})$ is always infinite. What we can do is compare different hedging parameters. Figure 1 shows relative-entropy error for $\beta$ = 10~^, 10~^, $\tfrac{1}{2}$, applied to a single qubit measured $N = 10, 10^2, 10^3$ times in each of the Pauli bases.



The error depends on $\rho$, most strongly on its radial coordinate $r$. Three regimes are evident. (1) For highly mixed states ($1 - r \gg \sqrt{3/N}$), where MLE rarely yields rank-deficient estimates, accuracy increases slowly with $\beta$. (2) For slightly mixed states ($1 - r \approx \sqrt{3/N}$), where MLE frequently yields a zero eigenvalue, accuracy improves dramatically with increased $\beta$, up to $\beta \approx 1/2$. (3) Nearly-pure states ($1 - r \ll \sqrt{3/N}$) display unexpected and complex behavior. Optimal accuracy is achieved by a very small amount of hedging that decreases with $N$ as $\beta_{\rm optimal}$ ~ 2^1?'. Beyond this point, more hedging leads to greater inaccuracy - most noticeably for pure states.

Other error metrics include Euclidean distance ($\sqrt{\mathrm{Tr}[(\rho - \sigma)^2]}$), infidelity ($1 - [\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}]^2$), and trace distance ($\mathrm{Tr}|\rho - \sigma|$) [1]. Each of these metrics has its purpose, but none of them is particularly appropriate for comparing $\hat\rho$ to $\rho$. Nonetheless, since they are widely used, Fig. 2 illustrates their behavior for MLE and HMLE, applied to a single qubit measured in the Pauli bases. Both Euclidean/trace-norm distance (they are equivalent for qubits) and infidelity show the same basic behavior. For nearly pure states, MLE is more accurate. For highly mixed states, HMLE improves accuracy slightly. The biggest improvement comes in the intermediate regime where $O(1/N) < 1 - r^2 < O(1/\sqrt{N})$. These states are not quite pure, but close enough that MLE yields rank-deficient estimates a substantial fraction of the time. In this regime, hedging provides substantial improvement. So, even though HMLE is not designed to maximize fidelity or trace-distance, it improves on MLE for all but the purest states.

Discussion: There is nothing sacred about the maximum of $\mathcal{L}(\rho)$, but $\hat\rho_{\rm H}$ should not be significantly less likely than $\hat\rho_{\rm MLE}$. Likelihood measures plausibility, and if $\mathcal{L}(\hat\rho_{\rm H})$ is almost as large as $\mathcal{L}(\hat\rho_{\rm MLE})$, then $\hat\rho_{\rm H}$ is almost as plausible as $\hat\rho_{\rm MLE}$. If they have identical properties, we may as well pick the more plausible one - but if $\hat\rho_{\rm H}$ is substantially different in some way, then it should be considered on its merits unless $\hat\rho_{\rm MLE}$ has significantly higher likelihood. We have already seen that the HMLE estimate has substantially different properties, so let's confirm that it is not significantly less likely.

FIG. 2: Methodology: See Fig. 1. MLE and HMLE (with several $\beta$ values) estimates were calculated, and for each $\rho_{\rm true}$, Euclidean distance and infidelity were averaged over all datasets. (Note that the trace and Euclidean distances are equivalent for qubits: $\|\hat\rho - \rho\|_1 = \sqrt{2}\,\|\hat\rho - \rho\|_2$.) Results: The same three regimes are apparent as in Fig. 1. For highly mixed states, hedging provides a small but consistent improvement. Slightly mixed states see substantial improvement from hedging. For nearly-pure states, hedging decreases accuracy regardless of $\beta$. However, nearly-pure states are typically estimated with greater accuracy than more mixed states, so the best overall performance is achieved by hedging. $\beta = 0.25 \ldots 1$ seems to be optimal; for $\beta > 1$, the error in pure state estimation outweighs benefits for mixed states.

Consider the classical case. If $n_k = 0$, then $\hat p_{\rm MLE}$ assigns $\hat p_k = 0$. But it's equally plausible that $p_k > 0$, since if $p_k < \tfrac{1}{N}$ then "$k$" probably won't appear in the first $N$ samples. The likelihood function bears this out: the most likely state sets $p_k = 0$, but nearby states with nonzero $p_k$ have almost the same likelihood. If $\hat p_{\rm H}$ assigns $\hat p_k = \frac{\beta}{N}$ and $\hat p_j = \left(1 - \frac{\beta}{N}\right)\frac{n_j}{N}$ for $j \neq k$, then

$$\frac{\mathcal{L}(\hat p_{\rm H})}{\mathcal{L}(\hat p_{\rm MLE})} = \left(1 - \frac{\beta}{N}\right)^{N} \approx e^{-\beta}. \qquad (11)$$

Likelihood ratios between $e^{-1}$ and $e$ are "barely worth mentioning" [15], so if $\beta < 1$, then $\hat p_{\rm H}$ is essentially as plausible as $\hat p_{\rm MLE}$. Actually, $\hat p_{\rm MLE}$ comprises $K - 1$ independent parameters, and in this case likelihood ratios between $e^{-K}$ and $e^{K}$ are insignificant. [Typically, $\mathcal{L}(p_{\rm true}) \approx e^{-K}\mathcal{L}(\hat p_{\rm MLE})$, so tighter significance criteria would reject the true state.] If $\hat p_{\rm MLE}$ assigns zero probability to $M < K$ different events, and $\hat p_{\rm H}$ hedges all $M$ of them, then the argument leading to Eq. 11 gives a likelihood ratio of $e^{-M\beta}$, which is not significant.
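The likelihood-ratio argument can be checked numerically. The sketch below assumes the hedged assignment described above (each unseen event receives $\beta/N$ and the observed frequencies are rescaled to keep the total at 1); the function names and the example counts are illustrative assumptions.

```python
import math

def log_likelihood(p, counts):
    """log prod_k p_k^{n_k}; terms with n_k = 0 contribute nothing."""
    return sum(n * math.log(pk) for n, pk in zip(counts, p) if n > 0)

def hedge_unseen(counts, beta):
    """Give each unseen event probability beta/N and rescale the rest."""
    N = sum(counts)
    M = sum(1 for n in counts if n == 0)           # number of hedged events
    scale = 1.0 - M * beta / N
    return [beta / N if n == 0 else scale * n / N for n in counts]

counts, beta = [6, 4, 0], 0.5                       # assumed data: one unseen letter
N = sum(counts)
p_mle = [n / N for n in counts]
p_h = hedge_unseen(counts, beta)

log_ratio = log_likelihood(p_h, counts) - log_likelihood(p_mle, counts)
print(log_ratio, N * math.log(1 - beta / N), -beta)
```

The log of the ratio equals $N\log(1 - \beta/N)$ exactly, which is close to $-\beta$ even for $N = 10$, in line with Eq. 11.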

For quantum HMLE, it's possible to show the same result for the HMLE estimate $\hat\rho_{\rm H}$:

$$\frac{\mathcal{L}(\hat\rho_{\rm H})}{\mathcal{L}(\hat\rho_{\rm MLE})} \ge e^{-d\beta}. \qquad (12)$$

The proof is a bit long, and can be found in Appendix A of the arxiv.org version.

Conclusions: Hedging is a simple, well-motivated solution to the zero eigenvalue problem. It is also easy to implement - unlike, for instance, Bayesian techniques (which tend to be an order of magnitude harder to calculate). HMLE can be implemented by a near-trivial change to any MLE routine. Because the hedged likelihood goes smoothly to zero near the boundary, no explicit positivity constraint is needed. So in fact it may be easier than MLE, as simple gradient-crawling methods should work (though care is necessary for small $\beta$, where the boundary roll-off becomes sharper).

Because $\hat\rho_{\rm H}$ is always full-rank, it can safely be used for predictive tasks like gambling and data compression. HMLE works well for qubits, providing improved accuracy by almost all metrics. Further studies will reveal how well it works for larger systems. Pure states are best estimated using very small values of $\beta$, and in general the optimal value of $\beta$ is not clear. This contrast with the classical case, where $\beta \approx \tfrac{1}{2}$ is known to be asymptotically optimal, suggests that alternative hedging functions (which do not correspond to "add $\beta$") may work better for quantum estimation.
Appendix A

The point of this section is to demonstrate that the hedged maximum likelihood estimate $\hat\rho_{\rm H}$ is never significantly less plausible than the MLE estimate, i.e.

$$\frac{\mathcal{L}(\hat\rho_{\rm H})}{\mathcal{L}(\hat\rho_{\rm MLE})} \ge e^{-d\beta}.$$

The MLE estimate maximizes the log-likelihood ($l(\rho) = \log\mathcal{L}(\rho)$), while the HMLE estimate maximizes the hedged log-likelihood ($l'(\rho) = l(\rho) + \log h(\rho)$, where $h(\rho)$ is given in Eq. 1). Both are concave functions on a convex subset of $\mathbb{R}^{d^2}$. It is convenient to think of $-l(\rho)$ and $-l'(\rho)$ as potential energy functions, and of $\hat\rho_{\rm MLE}$ and $\hat\rho_{\rm H}$ as the corresponding equilibrium states. In this picture, the gradients $\nabla l(\rho)$ and $\nabla\log h(\rho)$ are force fields (which balance perfectly at $\hat\rho_{\rm H}$), and the logarithm of the likelihood ratio

$$\log\left(\frac{\mathcal{L}(\hat\rho_{\rm MLE})}{\mathcal{L}(\hat\rho_{\rm H})}\right) = l(\hat\rho_{\rm MLE}) - l(\hat\rho_{\rm H}) = \Delta l$$

is the amount of work done by $\nabla\log h(\rho)$ in adiabatically changing the equilibrium from $\hat\rho_{\rm MLE} \to \hat\rho_{\rm H}$.

Because $h(\rho)$ depends only on $\rho$'s eigenvalues, the corresponding "force"

$$\nabla\log h(\rho) = \beta\rho^{-1}$$

is orthogonal to unitary rotations, and acts only on the spectrum of $\rho$. Furthermore, while it diverges at the boundary, it becomes rapidly and monotonically weaker away from the boundary. So, although it inexorably forces $\rho$ off the boundary, it does not necessarily push it very far.

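Let us imagine that the hedging parameter (denoted $\beta'$) is adiabatically increased from zero to $\beta$. For each $\beta'$, there is an equilibrium $\hat\rho_{\beta'}$. Increasing $\beta'$ by $d\beta'$ shifts it a distance $d\rho$ and does work

$$-\nabla l \cdot d\rho = \nabla\log h\big|_{\hat\rho_{\beta'}} \cdot d\rho.$$

Integrating $\nabla\log h|_{\hat\rho_{\beta'}} \cdot d\rho$ along the entire path yields $\Delta l$. Since $\nabla\log h$ is orthogonal to unitary changes in $\rho$, the integral is only sensitive to motion within the eigenvalue simplex, so

$$\nabla\log h\big|_{\hat\rho_{\beta'}} \cdot d\rho = \sum_k \frac{\beta'}{\lambda_k}\,d\lambda_k.$$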

It's tempting to evaluate this directly as

$$\Delta l = \int_{\rho = \hat\rho_{\rm MLE}}^{\hat\rho_{\rm H}} \sum_k \frac{\beta'}{\lambda_k}\,d\lambda_k,$$

but $\beta'$ changes with $\rho$. Instead, we upper-bound the integral by observing that $-l(\rho)$ and $-\log h(\rho)$ are both convex, so their second derivatives are positive. As $\hat\rho_{\beta'}$ moves away from the maximum of $l(\rho)$ and toward that of $\log h(\rho)$, the components of $-\nabla l$ and $\nabla\log h$ parallel to $d\rho$ are strictly increasing. Thus, substituting $\nabla\log h$ evaluated at $\hat\rho_{\rm H}$ into the integral yields an upper bound. Defining the eigenvalues of $\hat\rho_{\rm MLE}$ as $\lambda_k^0$ and those of $\hat\rho_{\rm H}$ as $\lambda_k$,

$$\Delta l \le \int_{\rho = \hat\rho_{\rm MLE}}^{\hat\rho_{\rm H}} \sum_{k=1}^{d} \frac{\beta}{\lambda_k}\,d\lambda'_k = \beta \sum_{k=1}^{d} \left(1 - \frac{\lambda_k^0}{\lambda_k}\right) \le d\beta.$$

This means that $\mathcal{L}(\hat\rho_{\rm H})$ is at least $e^{-d\beta}\,\mathcal{L}(\hat\rho_{\rm MLE})$, so for $\beta < 1$ it is not significantly less plausible.

This does not necessarily mean that $\hat\rho_{\rm H}$ is close to $\hat\rho_{\rm MLE}$. When $\mathcal{L}(\rho)$ is nearly flat, hedging can cause substantial deflection - precisely because there is no gradient in $\mathcal{L}$ to oppose it. When $\mathcal{L}$ is sharply peaked around $\hat\rho_{\rm MLE}$, hedging has comparatively little effect.

Appendix B 

The point of this section is to show that the hedging function given in Eq. 1,

$$h(\rho) = \det(\rho)^{\beta},$$

is the only smooth hedging function that reproduces the "add $\beta$" rule for measurements in any single orthonormal basis. That is, when $N$ i.i.d. $d$-dimensional quantum systems all have been measured in a single basis denoted $\{|0\rangle \ldots |d-1\rangle\}$, and outcome $|k\rangle\langle k|$ has appeared $n_k$ times, the maximum of the hedged likelihood $\mathcal{L}' = h(\rho)\mathcal{L}(\rho)$ should be

$$\hat\rho_{\rm H} = \sum_k \frac{n_k + \beta}{N + d\beta}\,|k\rangle\langle k|. \qquad (13)$$

First, consider hedging according to Eq. 1. The hedged likelihood is

$$\mathcal{L}'(\rho) = \det(\rho)^{\beta} \prod_k \langle k|\rho|k\rangle^{n_k}.$$

It's equally valid (and more convenient) to maximize its logarithm,

$$\log\mathcal{L}'(\rho) = \beta\log\det(\rho) + \sum_k n_k \log\langle k|\rho|k\rangle.$$



This function's gradient thus has two components, one from the likelihood and one from the hedging function. The likelihood depends only on the diagonal elements of $\rho$, so its gradient is orthogonal to off-diagonal variations. The hedging function is unitarily invariant, so its gradient is orthogonal to unitary rotations. If we vary only over diagonal matrices $\rho = \sum_k p_k |k\rangle\langle k|$, then this problem reduces to the classical one and it's easy to show that Eq. 13 is the maximum. Furthermore, this is a local maximum (with respect to all variations), because the gradient of $h(\rho)$ is orthogonal to unitary rotations and therefore locally orthogonal to off-diagonal variations. This is also a global maximum, because $\log\mathcal{L}'(\rho)$ is a concave function. Thus, Eq. 13 maximizes the hedged likelihood.

Now, consider some other hedging function $h'(\rho)$. If $h'(\rho)$ is not unitarily invariant, then there exists some point $\rho$ such that, in the neighborhood of $\rho$, the gradient of $h'(\rho)$ is not orthogonal to unitary rotations. Suppose that the measured basis $\{|k\rangle\}$ is the diagonal basis of $\rho$, and the measured frequencies are such that $\hat\rho_{\rm H}$ (given by Eq. 13) is in the neighborhood of $\rho$. Then, at the point $\hat\rho_{\rm H}$, the gradient of $\mathcal{L}(\rho)$ is orthogonal to off-diagonal variations, but the gradient of $h'(\rho)$ is not. This means that the gradient of the hedged likelihood does not vanish, and thus its maximum cannot coincide with Eq. 13.

If $h'(\rho)$ is unitarily invariant, then for every measured basis, the maximization can safely be restricted to diagonal matrices $\rho = \sum_k p_k |k\rangle\langle k|$, and it reduces to a classical problem, maximizing

$$\log\mathcal{L}'(p) = \log h'(p) + \log\mathcal{L}(p).$$

To reproduce the "add $\beta$" rule, the gradient of $\log\mathcal{L}'$ must vanish - for all $\{n_k\}$ - at $\hat p_{\rm H} = \left\{\frac{n_k + \beta}{N + d\beta}\right\}$. This implies

$$\nabla\log h'\big|_{\hat p_{\rm H}} = -\nabla\log\mathcal{L}\big|_{\hat p_{\rm H}},$$

and since this condition is automatically satisfied by Eq. 1,

$$\nabla\log h'\big|_{\hat p_{\rm H}} = \nabla\log h\big|_{\hat p_{\rm H}}$$

at every point $\hat p_{\rm H} = \left\{\frac{n_k + \beta}{N + d\beta}\right\}$ for all $\{n_k\}$. These points are dense in the simplex on which $h'(p)$ is defined. This means that $h(\rho) = h'(\rho)$, since smooth functions whose derivatives agree at a dense set of points are identical.